Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-6049: Misc. hygiene and code cleanup changes #1085

Closed
wants to merge 1 commit into from

Conversation

paul-rogers
Copy link
Contributor

Collection of "hygiene" changes arising from the vector column writer project. This represents the set of changes outside of the core mechanism so that reviews can be done in multiple PRs rather than a single monster PR that mixes generic and core work.

@paul-rogers
Copy link
Contributor Author

Error Handling

The column writer work introduces a new scan operator mechanism. Part of that mechanism is to standardize error handling. Part of that work is, in turn, to add a a couple new error types to UserError to better categorize errors.

Star Column

Organized our many references to the "star column" to make it more readily available to the new scan framework.

Date/Time Utilities

The various date/time vectors have very handy methods to convert the vector representation into Joda objects. This PR pulls that code out into a utility class that can be used in other places. This change uncovered a flaw in JDBC: that the existing date/time utilities were not included in the JDBC jar, causing failures. This PR fixes that.

Deprecate OptionSet

In a previous set of PRs we've been assembling a sub-operator test framework. This PR continues that work by deprecating the prior OptionSet class. (OptionSet was meant to work around limitations in the original OptionManager, but those issues have since been resolved by all the great work that Jyothsna and Tim have done. Since OptionSet now no longer serves a useful purpose, this PR removes it.)

Deprecate StatsWriter

Similarly, the StatsWriter class added a stats mechanism that can be used in sub-operator tests. Further experience has shown we can make the full OperatorStats work in this context, so StatsWriter is deprecated in this PR.

Fix for Close of Sort Operator

A bit of code was reshuffled to close the operator context in the correct order in the sort operator in both production and unit tests.

Swap Containers

Added code to swap buffers between two containers. Builds on code added earlier to exchange buffers between two vectors. (Used in scan operator, to be added later.)

CSV Column Header Error Handling

Changed CSV to use UserException to report errors instead of an ad-hoc HeaderError exception.

Test Tools

Add a number of services to the ClusterFixture family of test classes. The key new feature is the ability to read the results of a multi-batch query as a sequence of RowSet objects so we can use the row set tools to verify results.

Union Vector

The union vector is a rather odd duck: it works, but is not widely supported in Drill. It stores its "type" members in two ways: in an internal map and in member variables. The member variable format makes it hard to treat the types generically. This PR keeps the serialized format (storing vectors in an internal map) but changes the member variable format to use an array of type vectors rather than individual variables. The result is that generic access (for the new Union column writer) is much easier.

Materialized Field in Vectors

Vectors provide a MaterializedField to hold their metadata. The original idea seems to be that a MaterializedField is immutable. This worked great for simple scalar types. But, code evolved to change the MaterializedField for maps (to add map members), for Unions (to add union members) and so on.

For container types (Map, Repeated Map, Union, List, Repeated List), the "parent" MaterializedField holds onto a copy of the child field. But, some parents, when they receive new members, throw away their old MaterializedField and create a new one. The result is that, if we have the structure map(map(a, b)), when we add b to the inner map, the parent ends up pointing to an older version of the child schema, not the updated one.

This PR fixes at least some of those issues in the Union vector.

JUnit Upgrade

Upgrades JUnit from 4.11 to 4.12 to make use of a handy annotation that disables timeouts when debugging.

Generic Cleanup

Added comments, fixed indentation and other minor changes.

*/

@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated means that we might remove it in future. But since you mention that it is still ok to use in certain cases, I am not sure that such annotation is appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Merge error. It is deprecated in master; I have undeprecated it in my branch...

return copy1.equals(copy2);
}

public static String typeKey(MinorType type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this method? It looks like it is never used? If it's intended for future use please add java doc then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -54,8 +56,7 @@
static MemWatcher memWatcher;
static String className;

@Rule public final TestRule TIMEOUT = TestTools.getTimeoutRule(100000);

@Rule public final TestRule TIMEOUT = new DisableOnDebug(TestTools.getTimeoutRule(100_000));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great enhancement!

@Override
@VisibleForTesting
public long getLongStat(MetricDef metric) {
return longMetrics.get(metric.metricId());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If metric is absent, will it return 0 or fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably fail. But, since this used in testing to ensure that a long metric was, in fact, written, failure is actually a useful result.

@@ -38,20 +38,26 @@
import org.apache.drill.exec.server.options.SystemOptionManager;
import org.apache.drill.exec.store.sys.store.provider.LocalPersistentStoreProvider;
import org.apache.drill.exec.util.GuavaPatcher;
import org.apache.drill.test.BaseDirTestWatcher;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting from this class, github does not highlight Java code. I remember @vdiravka had similar problem, it turned out that he copied code from text editor with some hidden symbols. Please check and fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the source. No hidden characters. For me, formatting does appear below this line.

I suspect your browser hit some limit. Large PRs make Safari struggle to display the code. Suggestion: refresh the page. search for this file, and keep going. Perhaps, if you don't expand prior files, your browser will handle the formatting. (GitHub should provide some solution for large PRs...)

Copy link
Contributor Author

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arina-ielchiieva, thank you for your kind review. Comments were addressed. Looks like GitHub UI prevented you from reviewing some of the files. See the suggestions inline and please try again.

*/

@Deprecated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Merge error. It is deprecated in master; I have undeprecated it in my branch...

return copy1.equals(copy2);
}

public static String typeKey(MinorType type) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Override
@VisibleForTesting
public long getLongStat(MetricDef metric) {
return longMetrics.get(metric.metricId());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably fail. But, since this used in testing to ensure that a long metric was, in fact, written, failure is actually a useful result.

@@ -38,20 +38,26 @@
import org.apache.drill.exec.server.options.SystemOptionManager;
import org.apache.drill.exec.store.sys.store.provider.LocalPersistentStoreProvider;
import org.apache.drill.exec.util.GuavaPatcher;
import org.apache.drill.test.BaseDirTestWatcher;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the source. No hidden characters. For me, formatting does appear below this line.

I suspect your browser hit some limit. Large PRs make Safari struggle to display the code. Suggestion: refresh the page. search for this file, and keep going. Perhaps, if you don't expand prior files, your browser will handle the formatting. (GitHub should provide some solution for large PRs...)

@arina-ielchiieva
Copy link
Member

@paul-rogers, hope this is browser issue but just in case please take a look at UnionVector class to confirm that everything is fine. Also please resolve merge conflicts. Other than that, +1, LGTM.

@amansinha100
Copy link

@paul-rogers since the DRILL-3993 (Calcite related changes) went into master last week, this PR would need to be rebased. I can merge it in soon after that.

@paul-rogers
Copy link
Contributor Author

@amansinha100, rebased on master. Ran unit tests. Please run pre-commit tests and let me know if anything is amiss. Once I hear the tests pass, I'll mark this as ready-to-commit. Thanks.

@amansinha100
Copy link

Sounds good. I will run the pre-commit tests. Since this PR was previously in a ready-to-commit stage and only needed a rebase, it is fine to mark it ready-to-commit.

@asfgit asfgit closed this in e791ed6 Jan 24, 2018
reudismam pushed a commit to reudismam/drill that referenced this pull request Jan 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants